Abstract. We consider the uniform distribution of solutions $(x, y)$ to $xy \equiv N \pmod a$, and obtain a bound on the second moment of the number of solutions 
in squares of length approximately $a^{1/2}$. We use this to study a new factoring 
algorithm that factors $N = UV$ provably in $O(N^{1/3+\varepsilon})$ time, and discuss the 
potential for improving the runtime to sub-exponential. 



1. Introduction 

Let $\gcd(a, N) = 1$. A classic application of Kloosterman sums shows that the 
points $(x, y) \bmod a$ satisfying $xy \equiv N \pmod a$ become uniformly distributed in the 
square of side length $a$ as $a \to \infty$. In this paper we investigate an application of 
this fact to the problem of factoring integers. We give a new method to factor the 
integer $N$ which beats trial division, and prove that it runs in time $O(N^{1/3+\varepsilon})$. 

While the complexity of our method is not exciting, considering the existence 
of several probabilistic sub-exponential factoring algorithms, the runtime here is 
provable and does compete favourably with the best known provable factoring 
algorithm, Pollard-Strassen, which runs in time only $O(N^{1/4+\varepsilon})$. Shanks's class group 
method runs in time $O(N^{1/5+\varepsilon})$ assuming the GRH. The algorithm is described in 
Section 2. 

Furthermore, proving this runtime requires understanding the finer distribution 
of solutions to $xy \equiv N \pmod a$, and our results in this regard are interesting in 
their own right. We discuss the problem of uniform distribution in Sections 4 and 5. 

Finally, all existing sub-exponential factoring algorithms have grown out of much 
weaker exponential algorithms, and we hope that the factoring ideas presented here 
will be improved. In Section 3 we discuss some needed improvements to achieve a 
better runtime. 



2. Algorithm: hide and seek 

Let N be a positive integer that we wish to factor. Say N = UV where U and 
V are positive integers, not necessarily prime, with 1 < U < V . For simplicity, 
assume $V < 2U$, so that $V < (2N)^{1/2}$. The general case, without this restriction, 
will be handled at the end of this section. 
The idea behind the algorithm is to perform trial division of N by a couple of 
integers, and to use information about the remainder to determine the factors U 
and V. 

Let $a$ be a positive integer, $1 < a < N$. By the division algorithm, write 

$$\begin{aligned} U &= u_1 a + u_0, & 0 \le u_0 < a, \\ V &= v_1 a + v_0, & 0 \le v_0 < a. \end{aligned} \tag{2.1}$$

Assume that $u_0$ is relatively prime to $a$, and likewise for $v_0$, since otherwise one 
easily extracts a factor of $N$ by taking $\gcd(a, N)$. If, for a given $a$, we can determine 
$u_0, u_1, v_0, v_1$, then we have found $U$ and $V$. 

Consider $N \equiv u_0 v_0 \pmod a$. One cannot simply determine $u_0$ and $v_0$ from the 
value of $N \bmod a$, since $\phi(a)$ pairs of integers $(x, y) \bmod a$ satisfy $xy \equiv N \pmod a$ 
(if $x \equiv m u_0 \pmod a$, then $y \equiv m^{-1} v_0 \pmod a$, where $\gcd(m, a) = 1$). 

However, say $a$ is large, $a = \lceil (2N)^{1/3} \rceil \ge V^{2/3}$, so that $u_1$ and $v_1$ are 
comparatively small: $u_1, v_1 \le V^{1/3}$, i.e. both are $\le a^{1/2}$. If we consider $N \bmod (a - \delta)$, 

$$N = UV \equiv (u_1\delta + u_0)(v_1\delta + v_0) \pmod{a - \delta}, \tag{2.2}$$

for $\delta = 0, 1$, we get, as solutions $(x, y)$ to $xy \equiv N \pmod{a - \delta}$, two nearby points, 
$(u_0, v_0)$ and $(u_0 + u_1, v_0 + v_1)$, whose coordinates are within $a^{1/2}$ of one another. 

So, one divides the Cartesian plane into squares of side length $a^{1/2}$, and for 
$\delta = 0, 1$ lists all $\phi(a - \delta)$ pairs of integers $(x, y) \bmod (a - \delta)$ that satisfy 
$xy \equiv N \pmod{a - \delta}$, throwing them into the appropriate squares in the Cartesian plane. We 
can assume that $\gcd(a - \delta, N) = 1$, because otherwise we easily extract a factor of 
$N$. The computational time needed to do so is $O(a)$, since computing all inverses 
mod $a$ can be done in $O(a)$ operations (start with $m = 2$, multiply mod $a$ by $m$ 
until we arrive at 1, or hit a residue class already encountered. Then, take the first 
residue not hit by the powers of $m$ and repeat the previous step until all residue 
classes are exhausted). 
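The inverse-table construction in the parenthesis above can be sketched as follows. This is our reading of the power-chain idea, not code from the paper, and the function name `all_inverses` is hypothetical. For each unit $g$ not yet handled we walk $g, g^2, g^3, \dots$ until we reach a residue $x = g^k$ whose inverse is already known (at worst $x = 1$). Then $g^{-j} = g^{k-j}\,x^{-1}$ for $1 \le j < k$, so each new residue costs $O(1)$ multiplications. For simplicity the sketch also tests $\gcd(g, a)$, which adds a logarithmic factor.

```python
from math import gcd

def all_inverses(a):
    """Map every unit r mod a (a >= 2) to r^{-1} mod a via power chains.

    Hypothetical helper illustrating the O(a)-multiplication idea in the text;
    the gcd test is only there to skip non-units and is not part of that count.
    """
    inv = {1: 1}
    for g in range(2, a):
        if g in inv or gcd(g, a) != 1:
            continue
        powers = [1]            # powers[j] = g^j mod a
        x = g
        while x not in inv:     # stop at the first power whose inverse is known
            powers.append(x)
            x = x * g % a
        k = len(powers)         # now x = g^k mod a
        x_inv = inv[x]
        for j in range(1, k):   # g^{-j} = g^{k-j} * (g^k)^{-1}
            inv[powers[j]] = powers[k - j] * x_inv % a
    return inv
```

Every residue receives its inverse exactly once, so the total number of modular multiplications is proportional to $\phi(a)$.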

Now there are $\phi(a) + \phi(a - 1) = O(a)$ points $(x, y)$ for $\delta = 0, 1$, and they all satisfy 
$0 \le x, y < a$. These points are uniformly distributed in the large square of side 
length $a$ (see Section 4). This large square is partitioned, as above, into roughly 
$a$ smaller squares of side length $a^{1/2}$, and so we expect most squares to contain 
$O(1)$ points (this is made more precise in Sections 4 and 5). At the topmost and 
rightmost edges the squares will be truncated, unless $a^{1/2}$ is an integer. 

The two points we wish to find, $(u_0, v_0)$ and $(u_0 + u_1, v_0 + v_1)$, which are included 
amongst our exhaustive list of points, fall within the same square of side length 
$a^{1/2}$ or, at worst, within two neighbouring squares (when comparing neighbouring 
squares at the rightmost or topmost border of the large square, one wraps around 
to the other side of the square and adds $a - 1$ to the appropriate coordinate). So, 
one scans across all such squares and their immediate neighbours in $O(a)$ time, 
looking at all pairs of points contained in said squares. Each pair of points gives us 
candidate values for $u_0, v_0$ and $u_1, v_1$, and we check whether they produce 
$N = (u_1 a + u_0)(v_1 a + v_0)$. Since most squares are expected to contain $O(1)$ points, 
the overall time to check all squares and points is predicted to be $O(a)$. In Section 
5 we obtain a bound of $O(a^{1+\varepsilon})$. This algorithm terminates successfully when the 
true points $(u_0, v_0)$ and $(u_0 + u_1, v_0 + v_1)$ are found. Since $a = O(N^{1/3})$, this gives 
a running time that is provably $O(N^{1/3+\varepsilon})$. 
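The following Python sketch runs the balanced case ($U < V < 2U$) end to end. It illustrates the procedure under simplifying assumptions and is not the paper's implementation. The helper names (`icbrt_ceil`, `solutions`, `hide_and_seek`) are hypothetical. Inverses come from Python's built-in `pow(x, -1, m)` (Python 3.8+) rather than from the $O(a)$ power-chain method. The square side is taken as `isqrt(a) + 1`, so that $u_1, v_1 \le a^{1/2}$ are strictly smaller than it. The wrap-around at the right and top edges is handled by adding copies, shifted by $a - 1$, of the $\delta = 1$ points that lie near the left and bottom edges.

```python
from math import gcd, isqrt
from collections import defaultdict

def icbrt_ceil(n):
    """Smallest integer r with r**3 >= n."""
    r = max(1, round(n ** (1.0 / 3)))
    while r ** 3 < n:
        r += 1
    while (r - 1) ** 3 >= n:
        r -= 1
    return r

def solutions(N, m):
    """All (x, y), 0 <= x, y < m, with x*y = N mod m (assumes gcd(N, m) = 1)."""
    return [(x, N * pow(x, -1, m) % m) for x in range(1, m) if gcd(x, m) == 1]

def hide_and_seek(N):
    """Return a nontrivial factor of N, assuming N = U*V with U < V < 2U."""
    a = icbrt_ceil(2 * N)              # a = ceil((2N)^(1/3))
    for m in (a, a - 1):
        g = gcd(m, N)
        if 1 < g < N:
            return g                   # a - delta shares a factor with N
    s = isqrt(a) + 1                   # square side, strictly above a^(1/2)
    buckets = defaultdict(list)
    for x1, y1 in solutions(N, a - 1):     # delta = 1 points
        for X in (x1, x1 + a - 1):         # undo wrap-around mod a - 1
            for Y in (y1, y1 + a - 1):
                if X < a + s and Y < a + s:
                    buckets[(X // s, Y // s)].append((X, Y))
    for x0, y0 in solutions(N, a):         # delta = 0 points
        bx, by = x0 // s, y0 // s
        for key in ((bx, by), (bx + 1, by), (bx, by + 1), (bx + 1, by + 1)):
            for X, Y in buckets.get(key, ()):
                u1, v1 = X - x0, Y - y0
                if u1 < 0 or v1 < 0:
                    continue
                U, V = u1 * a + x0, v1 * a + y0   # candidate (2.1)
                if 1 < U < N and U * V == N:
                    return U
    return None
```

For example, `hide_and_seek(1009 * 1013)` returns one of the two prime factors. A candidate pair is accepted only after the check $UV = N$, so any returned value is a genuine factor. The bucketing by squares is what keeps the number of pairs examined small.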
The idea that lies behind the algorithm suggests the name 'Hide and Seek'. The 
solutions that we seek, $(u_0, v_0)$ and $(u_0 + u_1, v_0 + v_1)$, are hiding amongst many 
solutions in the large $a \times a$ square, but, like children who have hidden next to one 
another while playing the game Hide and Seek, they have become easier to spot. 

The storage requirement of $O(N^{1/3})$ can be improved to $O(N^{1/6})$ by generating 
the solutions $(x, y)$ to $xy \equiv N \pmod{a - \delta}$ lying in one vertical strip of width 
$O(a^{1/2})$ at a time (easy to do since we are free to choose $x$ as we please, which 
then determines $y$). In general, we are then no longer free to generate all modular 
inverses at once, and must compute inverses in intervals of size $a^{1/2}$, one at a time, 
at a cost, using the Euclidean algorithm, of $O(a^{\varepsilon})$ per inverse. 
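A minimal sketch of the strip-by-strip generation follows. The names `solutions_in_strip` and `strips` are hypothetical. Each inverse is computed on demand with Python's built-in `pow(x, -1, m)`, standing in for the Euclidean algorithm, so only the points of the current strip of width about $m^{1/2}$ are held in memory.

```python
from math import gcd, isqrt

def solutions_in_strip(N, m, x_start, width):
    """Yield (x, y) with x_start <= x < x_start + width and x*y = N mod m.

    Inverses are computed one at a time, so memory is O(width) rather than O(m).
    """
    for x in range(x_start, min(x_start + width, m)):
        if gcd(x, m) == 1:
            yield x, N * pow(x, -1, m) % m

def strips(N, m):
    """Walk across the m-by-m square one vertical strip of width ~ m**0.5 at a time."""
    width = isqrt(m) + 1
    for x_start in range(0, m, width):
        yield list(solutions_in_strip(N, m, x_start, width))
```

In the search itself, the current strip and its neighbour must both be kept, since the two hidden points may straddle a strip boundary. That still amounts to $O(a^{1/2}) = O(N^{1/6})$ points at a time.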

2.1. Variant. $1 < U < V < N$ without restriction. Say $U = N^{\alpha}$, $V = N^{1-\alpha}$. 
We may assume that $\alpha \ge 1/3$, for, if not, we can find $U$ in $O(N^{1/3})$ time by trial 
division. 

Let $a = \lceil N^{1/3} \rceil$. Instead of working with small squares of side length $a^{1/2}$, 
partition the $a \times a$ square into rectangles of width $w$ of size $N^{\alpha - 1/3}$ and height $h$ of 
size $N^{2/3 - \alpha}$. This choice of $w$ and $h$ is needed to make sure that, using the same 
notation as before, $(u_0, v_0)$ and $(u_0 + u_1, v_0 + v_1)$ are in the same or neighbouring 
rectangles. Since we do not know $\alpha$ a priori, we can perform these steps on an 
exponentially increasing set of $w$'s, for example starting with $w = 2$ and, should 
these steps fail to factor $N$, doubling the size of $w$, until $w \ge N^{\alpha - 1/3}$ and one 
successfully factors $N$. We can set $h = N^{1/3}/w$. 

The area of each rectangle is of size $N^{1/3}$, and that of the $a \times a$ square is approximately 
$N^{2/3}$, so there are $O(N^{1/3})$ rectangles (at the top and right edges these will typically 
be truncated), and most are expected to contain $O(1)$ solutions to $xy \equiv N \pmod{a - \delta}$. 
Running through each rectangle and its immediate neighbours, checking all pairs 
of points in these rectangles, suggests a running time of $O(N^{1/3})$ for a particular 
choice of $w$ and $h$. Since we might have to repeat this a few times, doubling the 
size of $w$, the overall running time gets multiplied by $O(\log N)$, which is $O(N^{\varepsilon})$. 

In Section 5.1, a running time equal to $O(N^{1/3+\varepsilon})$ is proven. 
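The variant can be sketched in the same style. This is again an illustration under stated assumptions, not the paper's code, and all function names are hypothetical. Trial division up to $a = \lceil N^{1/3} \rceil$ covers the case $\alpha \le 1/3$. The width then doubles from $w = 2$. The height $h = \lceil 2a/w \rceil + 1$ is our own choice of constant, since the text fixes $h$ only up to size $N^{1/3}/w$. With it, one checks that once $w$ is the first power of two exceeding $u_1$, also $v_1 \le a/u_1 \le 2a/w < h$. Shifted copies of the $\delta = 1$ points by $a - 1$ and $2(a - 1)$ undo the wrap-around.

```python
from math import gcd
from collections import defaultdict

def icbrt_ceil(n):
    """Smallest integer r with r**3 >= n."""
    r = max(1, round(n ** (1.0 / 3)))
    while r ** 3 < n:
        r += 1
    while (r - 1) ** 3 >= n:
        r -= 1
    return r

def solutions(N, m):
    """All (x, y), 0 <= x, y < m, with x*y = N mod m (assumes gcd(N, m) = 1)."""
    return [(x, N * pow(x, -1, m) % m) for x in range(1, m) if gcd(x, m) == 1]

def search(N, a, w, h, pts0, pts1):
    """One pass over w-by-h rectangles and their neighbours (hypothetical helper)."""
    buckets = defaultdict(list)
    for x1, y1 in pts1:
        # lift the delta = 1 points by multiples of a - 1 to undo wrap-around
        for X in (x1, x1 + a - 1, x1 + 2 * (a - 1)):
            for Y in (y1, y1 + a - 1, y1 + 2 * (a - 1)):
                if X < a + w and Y < a + h:
                    buckets[(X // w, Y // h)].append((X, Y))
    for x0, y0 in pts0:
        bx, by = x0 // w, y0 // h
        for key in ((bx, by), (bx + 1, by), (bx, by + 1), (bx + 1, by + 1)):
            for X, Y in buckets.get(key, ()):
                u1, v1 = X - x0, Y - y0
                if u1 < 0 or v1 < 0:
                    continue
                U = u1 * a + x0
                if 1 < U < N and N % U == 0 and N // U == v1 * a + y0:
                    return U
    return None

def hide_and_seek_general(N):
    """Variant of Section 2.1: trial division below a, then doubling w."""
    a = icbrt_ceil(N)
    for p in range(2, a + 1):      # handles U <= a, i.e. alpha <= 1/3
        if N % p == 0:
            return p
    # No factor <= a, so gcd(a, N) = gcd(a - 1, N) = 1 and u_1 >= 1.
    pts0, pts1 = solutions(N, a), solutions(N, a - 1)
    w = 2
    while w <= 2 * a:
        h = -(-2 * a // w) + 1     # ceil(2a / w) + 1: our choice of constant
        found = search(N, a, w, h, pts0, pts1)
        if found:
            return found
        w *= 2                     # unknown alpha: try the next width
    return None
```

The solution lists are built once and reused across the $O(\log N)$ passes. Only the bucketing is redone for each $(w, h)$.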

3. Towards a subexponential bound 

The above algorithm exploits the fact that when $a$ is large and $\delta$ is small, the 
points with coordinates $(U, V) \bmod (a - \delta)$ are close to one another. In fact they 
lie equally spaced on a line with common horizontal difference $u_1$ and vertical 
difference $v_1$. 

An obvious thing to try is to reduce the size of $a$. However, as $a$ decreases, $u_1$ 
and $v_1$ increase, so that not only do the points $(u_0, v_0)$ and $(u_0 + u_1, v_0 + v_1)$ move 
far apart, but the latter point soon falls far outside the square of side length $a$. 

To fix this, one can view (2.1) as the base $a$ expansion of $U$ and $V$. When $a$ is 
smaller, one could instead use a polynomial expansion 

$$\begin{aligned} U &= u_{d_1} a^{d_1} + \dots + u_1 a + u_0, & 0 \le u_i < a, \\ V &= v_{d_2} a^{d_2} + \dots + v_1 a + v_0, & 0 \le v_i < a, \end{aligned} \tag{3.1}$$

with $u_{d_1} \ne 0$ and $v_{d_2} \ne 0$. For simplicity in what follows, assume that the degrees 
of both polynomials are equal, $d_1 = d_2 = d$, so that both $U$ and $V$ satisfy $a^d \le U, V < a^{d+1}$. 
A polynomial of degree $d$ is determined uniquely by $d + 1$ values. Imitating the 
approach in Section 2, we evaluate $N \bmod (a - \delta)$ for $d + 1$ values of $\delta$. A natural 
choice would be $\delta = 0, \pm 1, \pm 2, \dots$, but, to keep everything positive, we consider 
$\delta = 0, 1, 2, \dots, d$. Now, 

$$N = UV \equiv (u_d\delta^d + \dots + u_1\delta + u_0)(v_d\delta^d + \dots + v_1\delta + v_0) \pmod{a - \delta}. \tag{3.2}$$

Since $0 \le u_i < a$, we have 

$$u_d\delta^d + \dots + u_1\delta + u_0 < a\lambda(d, \delta), \tag{3.3}$$

where 

$$\lambda(d, \delta) = \delta^d + \delta^{d-1} + \dots + 1 = \frac{\delta^{d+1} - 1}{\delta - 1} \ (\delta \ne 1), \qquad \lambda(d, \delta) \sim \delta^d \ \text{ as } \delta \to \infty, \tag{3.4}$$

and similarly for the $v_j$'s. 
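A quick numerical check of (3.1) to (3.4) may help. The sketch below (all helper names hypothetical) reads off the base-$a$ digits of $U$ and $V$, evaluates the digit polynomials at $\delta = 0, \dots, d$, and confirms that each resulting point solves $xy \equiv N \pmod{a - \delta}$ and satisfies the bound (3.3). It works in the easy direction, starting from known $U$ and $V$. The actual problem is to pick these $d + 1$ points out of all solutions of (3.5) and (3.6) knowing only $N$.

```python
def base_digits(n, a):
    """Base-a digits of n, least significant first, so n = sum(d_i * a**i)."""
    digits = []
    while n:
        n, r = divmod(n, a)
        digits.append(r)
    return digits or [0]

def poly_eval(coeffs, t):
    """Evaluate sum(coeffs[i] * t**i) by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc

def lam(d, delta):
    """lambda(d, delta) = delta^d + ... + delta + 1, as in (3.4)."""
    return sum(delta ** i for i in range(d + 1))

def hidden_points(U, V, a):
    """The points (u(delta), v(delta)), delta = 0..d, sought in Section 3.

    Assumes U and V have the same number of base-a digits (d1 = d2 = d).
    """
    u, v = base_digits(U, a), base_digits(V, a)
    assert len(u) == len(v), "this sketch assumes equal degrees"
    d = len(u) - 1
    return [(poly_eval(u, delta), poly_eval(v, delta)) for delta in range(d + 1)]
```

For instance, with $U = 1009$, $V = 1013$ and $a = 20$ (so $d = 2$), the three hidden points are $(9, 13)$, $(21, 25)$ and $(37, 41)$. The congruence holds because $a \equiv \delta \pmod{a - \delta}$.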

For each $\delta$ one lists all solutions $(x, y)$ to 

$$xy \equiv N \pmod{a - \delta}, \tag{3.5}$$

$$0 \le x, y < a\lambda(d, \delta). \tag{3.6}$$

The number of points $(x, y)$ for a given $\delta$ is $\phi(a - \delta)$ per $(a - \delta) \times (a - \delta)$ square, and hence, 
overall, is approximately 

$$\phi(a - \delta)\lambda(d, \delta)^2 = O(a d^{2d}). \tag{3.7}$$

We are again assuming that $\gcd(a - \delta, N) = 1$; otherwise one easily pulls out a 
factor of $N$. 

We need a method to recognize the solutions that we seek, $(u_d\delta^d + \dots + u_0,\ v_d\delta^d + \dots + v_0)$, 
hiding amongst all the $(x, y)$'s. This leads to the question: 

Let $X > 0$ and let $S_0, S_1, \dots, S_d$ be $d + 1$ sets of points in $\mathbb{Z}^2$, all of whose 
coordinates are positive and $< X$. Assume that amongst these points there exist 
$d + 1$ points, one from each $S_\delta$, whose coordinates are described by polynomials 
$u(\delta), v(\delta) \in \mathbb{Z}[\delta]$ of degree $d$. More precisely, for each $0 \le \delta \le d$ there exists a point 
$(x_\delta, y_\delta) \in S_\delta$ such that 

$$\begin{aligned} x_\delta &= u(\delta) = u_d\delta^d + \dots + u_0, \\ y_\delta &= v(\delta) = v_d\delta^d + \dots + v_0. \end{aligned} \tag{3.8}$$

Can one find these $d + 1$ points much more efficiently than by exhaustively searching 
through all possible $(d + 1)$-tuples of points? For example, can one find these points 
in time $O(X^{\alpha} d^{\beta d})$ for some $\alpha, \beta > 0$? 

In our application, $X = O(a d^{2d})$. Since $N = UV$ and $a^d \le U \le V < a^{d+1}$, 
we have $a \le N^{1/(2d)}$. Assuming that there is an $O(X^{\alpha} d^{\beta d})$ algorithm for finding 
points with polynomial coordinates, on taking $d$ proportionate to 

$$\left(\frac{\log N}{\log\log N}\right)^{1/2} \tag{3.9}$$

one gets a factoring algorithm requiring 

$$\exp\left(\gamma(\log N \log\log N)^{1/2}\right) \tag{3.10}$$

time and storage, for some $\gamma > 0$. 
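The arithmetic behind (3.9) and (3.10) can be filled in as follows. This is a heuristic calculation under the stated assumptions ($X = O(a d^{2d})$, $a \le N^{1/(2d)}$, and a hypothetical $O(X^{\alpha} d^{\beta d})$ search), and the explicit constants are our own bookkeeping rather than values from the text. Taking logarithms,

$$\log\left(X^{\alpha} d^{\beta d}\right) = \alpha\log a + (2\alpha + \beta)\, d\log d + O(1) \le \frac{\alpha\log N}{2d} + (2\alpha + \beta)\, d\log d + O(1).$$

With $d = c\,(\log N/\log\log N)^{1/2}$ for a constant $c > 0$, one has $\log d = (\tfrac{1}{2} + o(1))\log\log N$, so

$$\frac{\alpha\log N}{2d} = \frac{\alpha}{2c}(\log N\log\log N)^{1/2}, \qquad (2\alpha + \beta)\, d\log d = \left(\frac{(2\alpha + \beta)c}{2} + o(1)\right)(\log N\log\log N)^{1/2}.$$

Hence (3.10) holds for any $\gamma > \frac{\alpha}{2c} + \frac{(2\alpha + \beta)c}{2}$. The choice $c = (\alpha/(2\alpha + \beta))^{1/2}$ minimises this sum, allowing any $\gamma > (\alpha(2\alpha + \beta))^{1/2}$.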

One can cut back a bit on the search space by noting, for example, that the 
coefficients of $u(\delta)$ and $v(\delta)$ are integers (this imposes a divisibility restriction on 
finite differences between points lying on the polynomial), and, in our particular 
application, that the coefficients are non-negative and bounded, which restricts 
the rate of growth of the polynomials. However, to get down to a running time 
polynomial in $X$, one needs to do much better. 

More speculatively, to reduce the running time below $\exp(\gamma(\log N \log\log N)^{1/2})$, 
we might try to reduce the degree of the polynomials without increasing 
$a$, perhaps by choosing $\delta$'s that satisfy certain congruence relations mod $a - \delta$. 

4. Uniform distribution 

Let $\gcd(a, N) = 1$. A classic application of Kloosterman sums shows that the 
points $(x, y) \bmod a$ satisfying $xy \equiv N \pmod a$ become uniformly distributed in the 
square of side length $a$ as $a \to \infty$. While the tools used in this section are fairly 
standard, they will also be applied in the next section to estimate the running time 
of the Hide and Seek algorithm. Similar theorems can be found in the literature [T] 
[| U |] U, often with restrictions to prime values of a or to N = 1. 

Consider the following identity, which detects pairs of integers $(x, y)$ such that 
$xy \equiv N \pmod a$: 

$$\frac{1}{a}\sum_{k=0}^{a-1} e\left(\frac{k}{a}(y - \bar{x}N)\right) = \begin{cases} 1 & \text{if } xy \equiv N \pmod a, \\ 0 & \text{otherwise,} \end{cases} \tag{4.1}$$

where $e(z) = \exp(2\pi i z)$, and where $\bar{x}$ stands for any integer congruent to $x^{-1}$ 
mod $a$, if the inverse exists. Recall that we have assumed $\gcd(a, N) = 1$, so that 
any solution to $xy \equiv N \pmod a$ must have $\gcd(x, a) = 1$. Thus, for such solutions, 
$x^{-1} \bmod a$ exists. 

Let $R$ be the rectangle bounded horizontally by $x_1, x_2 \in \mathbb{Z}$ and vertically by 
$y_1, y_2 \in \mathbb{Z}$, where $0 \le x_1 < x_2 \le a$ and $0 \le y_1 < y_2 \le a$: 

$$R = R(x_1, x_2, y_1, y_2) = \{(x, y) \in \mathbb{Z}^2 \mid x_1 \le x < x_2,\ y_1 \le y < y_2\}. \tag{4.2}$$

Let $c_R(N, a)$ denote the number of pairs of integers $(x, y)$ that lie in the rectangle 
$R$ and satisfy $xy \equiv N \pmod a$: 

$$c_R(N, a) = \sum_{\substack{(x, y) \in R \\ xy \equiv N \bmod a}} 1. \tag{4.3}$$

The identity above gives 

$$c_R(N, a) = \frac{1}{a}\sum_{k=0}^{a-1}\sum_{\substack{(x, y) \in R \\ \gcd(x, a) = 1}} e\left(\frac{k}{a}(y - \bar{x}N)\right). \tag{4.4}$$

Notice that we only need to restrict $x$ to $\gcd(x, a) = 1$ and that $y$ runs over all 
residues in $y_1 \le y < y_2$. This will allow us to deal with the sum over $y$ as a 
geometric series. 
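Identity (4.4) can be sanity-checked numerically on small cases. The sketch below (helper names hypothetical) evaluates the right-hand side of (4.4) term by term, which takes $O(a \cdot \operatorname{area}(R))$ work and is meant only for checking. It compares the result with a direct count of the solutions in $R$. The exponent is reduced mod $a$ before dividing, to keep the floating-point arguments small.

```python
import cmath
from math import gcd

def e(z):
    """e(z) = exp(2*pi*i*z)."""
    return cmath.exp(2j * cmath.pi * z)

def c_R_direct(N, a, x1, x2, y1, y2):
    """Count (x, y) in R = [x1, x2) x [y1, y2) with x*y = N mod a."""
    return sum(1 for x in range(x1, x2) for y in range(y1, y2)
               if (x * y - N) % a == 0)

def c_R_exponential(N, a, x1, x2, y1, y2):
    """Right-hand side of (4.4), summed term by term (assumes gcd(N, a) = 1)."""
    total = 0
    for k in range(a):
        for x in range(x1, x2):
            if gcd(x, a) != 1:
                continue                     # only units x contribute in (4.4)
            xbar = pow(x, -1, a)
            for y in range(y1, y2):
                total += e((k * (y - xbar * N)) % a / a)
    return total / a
```

Up to rounding error, the exponential sum is a non-negative integer equal to the direct count.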

The $k = 0$ term provides the main contribution, while the other terms can be 
estimated using bounds for Kloosterman sums. We require two lemmas. The first 
considers the main contribution, and the second bounds the remaining terms. 

Lemma 4.1. The $k = 0$ term in (4.4) equals 

$$\frac{\operatorname{area}(R)\,\phi(a)}{a^2} + O\left((y_2 - y_1)a^{-1+\varepsilon}\right) \tag{4.5}$$

for any $\varepsilon > 0$. 
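Proof. The $k = 0$ term is 

$$\frac{1}{a}\sum_{\substack{(x, y) \in R \\ \gcd(x, a) = 1}} 1 = \frac{y_2 - y_1}{a}\sum_{\substack{x_1 \le x < x_2 \\ \gcd(x, a) = 1}} 1. \tag{4.6}$$

Using the Möbius function we have 

$$\begin{aligned} \sum_{\substack{x_1 \le x < x_2 \\ \gcd(x, a) = 1}} 1 &= \sum_{x_1 \le x < x_2}\ \sum_{d \mid \gcd(x, a)} \mu(d) = \sum_{d \mid a}\mu(d)\sum_{\substack{x_1 \le x < x_2 \\ d \mid x}} 1 \\ &= \sum_{d \mid a}\mu(d)\left(\frac{x_2 - x_1}{d} + O(1)\right) = (x_2 - x_1)\prod_{p \mid a}\left(1 - \frac{1}{p}\right) + O(\tau(a)), \end{aligned} \tag{4.7}$$

where $\tau(a)$ equals the number of divisors of $a$ and is $O(a^{\varepsilon})$ for any $\varepsilon > 0$. Since 
$\prod_{p \mid a}(1 - 1/p) = \phi(a)/a$, this implies that the $k = 0$ contribution to $c_R(N, a)$ equals 

$$\frac{(x_2 - x_1)(y_2 - y_1)\phi(a)}{a^2} + O\left((y_2 - y_1)a^{-1+\varepsilon}\right), \tag{4.8}$$

which gives the lemma. □ 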
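The Möbius step in (4.7) is exact before the $O(1)$ terms are introduced, and it is easy to check by computer. In the sketch below (helper names hypothetical), the number of multiples of $d$ in $[x_1, x_2)$ is $\lfloor (x_2 - 1)/d \rfloor - \lfloor (x_1 - 1)/d \rfloor$. Python's floor division makes this formula correct even when $x_1 = 0$.

```python
from math import gcd

def mobius(n):
    """Moebius function mu(n) by trial division (fine for small n)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # p^2 divided the original n
            result = -result
        p += 1
    return -result if n > 1 else result

def coprime_count(a, x1, x2):
    """#{x1 <= x < x2 : gcd(x, a) = 1} via sum_{d | a} mu(d) * #{multiples of d}."""
    return sum(mobius(d) * ((x2 - 1) // d - (x1 - 1) // d)
               for d in range(1, a + 1) if a % d == 0)

def coprime_count_brute(a, x1, x2):
    """The same count, by testing each x directly."""
    return sum(1 for x in range(x1, x2) if gcd(x, a) == 1)
```

Replacing each inner count by $(x_2 - x_1)/d + O(1)$ then gives the main term $(x_2 - x_1)\phi(a)/a$ with an error of at most one unit for each squarefree divisor of $a$.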

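The next lemma bounds the contribution of the $k \ge 1$ terms in (4.4). 

Lemma 4.2. For any $\varepsilon > 0$ we have 

$$\frac{1}{a}\sum_{k=1}^{a-1}\sum_{\substack{(x, y) \in R \\ \gcd(x, a) = 1}} e\left(\frac{k}{a}(y - \bar{x}N)\right) = O(a^{1/2+\varepsilon}). \tag{4.9}$$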

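Proof. One can separate the sum over $y$ and evaluate it as a geometric series, 
obtaining for the above expression 

$$\frac{1}{a}\sum_{k=1}^{a-1}\frac{e(ky_2/a) - e(ky_1/a)}{e(k/a) - 1}\sum_{\substack{x_1 \le x < x_2 \\ \gcd(x, a) = 1}} e\left(-\frac{k\bar{x}N}{a}\right). \tag{4.10}$$

Taking absolute values we get an upper bound of 

$$\frac{1}{a}\sum_{k=1}^{a-1}\left|\frac{\sin(\pi k(y_2 - y_1)/a)}{\sin(\pi k/a)}\right|\left|\sum_{\substack{x_1 \le x < x_2 \\ \gcd(x, a) = 1}} e\left(-\frac{k\bar{x}N}{a}\right)\right|. \tag{4.11}$$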

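Next, notice that the terms $k$ and $a - k$ give the same contribution, so we may 
restrict our attention to just the terms $1 \le k \le (a - 1)/2$. If $a - 1$ is odd (that is, $a$ is even), the 
middle term $k = a/2$ is left out at a cost of $O(1)$, and the bound becomes 

$$\frac{2}{a}\sum_{1 \le k \le (a-1)/2}\left|\frac{\sin(\pi k(y_2 - y_1)/a)}{\sin(\pi k/a)}\right|\left|\sum_{\substack{x_1 \le x < x_2 \\ \gcd(x, a) = 1}} e\left(-\frac{k\bar{x}N}{a}\right)\right| + O(1). \tag{4.12}$$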



The second sum above, over $x$, can be expressed in terms of Kloosterman sums, 
and using estimates for Kloosterman sums one has 

$$\sum_{\substack{x_1 \le x < x_2 \\ \gcd(x, a) = 1}} e\left(-\frac{k\bar{x}N}{a}\right) = O\left(a^{1/2+\varepsilon}\gcd(k, a)^{1/2}\right). \tag{4.13}$$
For a proof, see Lemma 4 on page 36 of Hooley's book [4] 
(his $r$ corresponds to our $a$, and his $l$ is $-kN$. Also recall that we are assuming 
$\gcd(N, a) = 1$, so that $N$ does not appear in the gcd of the $O$ term). 

Furthermore, using the Taylor expansion of $\sin(x)$ one obtains the two inequalities 

$$|\sin(x)| \le \min(x, 1), \quad x \ge 0, \qquad \frac{1}{\sin(x)} \le \frac{2}{x}, \quad 0 < x \le \pi/2. \tag{4.14}$$

For the second inequality, use $0 < x/2 < x - x^3/3! < \sin(x)$ in the stated interval. 
Applying (4.14) and (4.13) gives an upper bound for (4.12) of 

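$$O\left(a^{1/2+\varepsilon}\sum_{1 \le k \le (a-1)/2}\min\left(\frac{\pi k(y_2 - y_1)}{a}, 1\right)\frac{\gcd(k, a)^{1/2}}{k} + 1\right). \tag{4.15}$$

Breaking up the sum into $1 \le k \le a/(\pi(y_2 - y_1))$ and $a/(\pi(y_2 - y_1)) < k \le (a - 1)/2$, 
the sum over $k$ in the $O$ term equals 

$$\frac{\pi(y_2 - y_1)}{a}\sum_{1 \le k \le a/(\pi(y_2 - y_1))}\gcd(k, a)^{1/2} + \sum_{a/(\pi(y_2 - y_1)) < k \le (a-1)/2}\frac{\gcd(k, a)^{1/2}}{k}. \tag{4.16}$$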

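Both kinds of sums can be easily handled (the first can also be found in Hooley). 
Let $X > 0$. Then 

$$\sum_{1 \le k \le X}\gcd(k, a)^{1/2} \le \sum_{d \mid a} d^{1/2}\sum_{\substack{1 \le k \le X \\ d \mid k}} 1 \le X\sum_{d \mid a} d^{-1/2} = O(Xa^{\varepsilon}). \tag{4.17}$$

Next, let $0 < X_1 < X_2$. Then 

$$\sum_{X_1 < k \le X_2}\frac{\gcd(k, a)^{1/2}}{k} \le \sum_{d \mid a} d^{1/2}\sum_{\substack{X_1 < k \le X_2 \\ d \mid k}}\frac{1}{k} = O\left(\log(X_2 - X_1 + 2)\sum_{d \mid a} d^{-1/2}\right), \tag{4.18}$$

which equals 

$$O\left(\log(X_2 - X_1 + 2)\,a^{\varepsilon}\right). \tag{4.19}$$

Applying (4.17) and (4.19) to (4.16), we find that (4.15), and hence the left-hand side of (4.9), is 

$$O(a^{1/2+\varepsilon}), \tag{4.20}$$

completing the proof. □ 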

These two lemmas together give the following theorem. 

Theorem 4.3. Let $\gcd(N, a) = 1$ and let $R$ be as described in (4.2). Then $c_R(N, a)$, 
the number of solutions $(x, y)$ to $xy \equiv N \pmod a$ with $(x, y)$ lying in the rectangle 
$R$, is equal to 

$$\frac{\operatorname{area}(R)\,\phi(a)}{a^2} + O(a^{1/2+\varepsilon}) \tag{4.21}$$

for any $\varepsilon > 0$. 
This theorem shows that the points $(x, y)$ satisfying $xy \equiv N \pmod a$ are 
uniformly dense in the sense that the rectangle $R$ contains its fair share of solutions, 
so long as the area of $R$ is of larger size than $a^{3/2+\varepsilon}$. 

For example, if $R$ is a square, it needs to have side length at least $a^{3/4+\varepsilon}$ to 
contain its fair share of points. This is considerably larger than the side length of 
$a^{1/2}$ that is used in the algorithm of Section 2. 
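To see Theorem 4.3 at work, one can compare $c_R(N, a)$ with the main term $\operatorname{area}(R)\phi(a)/a^2$ for squares of side $a^{1/2}$, $a^{3/4}$ and $a/2$. The sketch below uses hypothetical helper names and the illustrative choices $a = 10007$ (a prime) and $N = 2$. It counts solutions with one modular inverse per unit $x$, since each unit has exactly one partner $y \bmod a$. No particular output is claimed here. The point is to compare the discrepancy with $a^{1/2}$, which the error term in (4.21) allows up to a factor $a^{\varepsilon}$.

```python
from math import gcd, isqrt

def phi(a):
    """Euler's totient, by brute force."""
    return sum(1 for x in range(1, a + 1) if gcd(x, a) == 1)

def c_R(N, a, x1, x2, y1, y2):
    """c_R(N, a) for R = [x1, x2) x [y1, y2) inside [0, a)^2, with gcd(N, a) = 1."""
    count = 0
    for x in range(x1, x2):
        if gcd(x, a) == 1:
            y = N * pow(x, -1, a) % a      # the unique y mod a paired with x
            if y1 <= y < y2:
                count += 1
    return count

def fair_share(a, x1, x2, y1, y2):
    """Main term area(R) * phi(a) / a^2 of (4.21)."""
    return (x2 - x1) * (y2 - y1) * phi(a) / a ** 2

if __name__ == "__main__":
    a, N = 10007, 2
    for side in (isqrt(a), round(a ** 0.75), a // 2):
        count = c_R(N, a, 0, side, 0, side)
        print(side, count, round(fair_share(a, 0, side, 0, side), 1), round(a ** 0.5, 1))
```

For the smallest square the main term is about 1, so the theorem says nothing useful there. This is exactly the regime the algorithm needs, and it is why Section 5 turns to the second moment.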

The paper of Shparlinski [6] contains many references to the problem of uniform 
distribution and discusses improved results on average over N . 

5. Second moment and running time 

We now examine the assertion made in Section 2 that $O(a^{1+\varepsilon})$ time is needed 
to scan across all $a$ squares of side length $a^{1/2}$ and their immediate neighbours, 
comparing all pairs of points contained in said squares. 

The running time of the algorithm in Section 2, i.e. in the case $U < V < 2U$, 
depends on how the solutions to $xy \equiv N \pmod a$ and $x'y' \equiv N \pmod{a - 1}$ are 
distributed amongst the small squares of side length $a^{1/2}$. In Section 5.1 we will 
consider the running time of the variant in Section 2.1, which is used for the general 
situation $1 < U < V < N$. 

Let $S$ denote one such square. Then the running time needed to examine just the 
square $S$, looking at all pairs of points $(x, y), (x', y')$ in $S$, is $O(c_S(N, a)\,c_S(N, a - 1))$, 
which, by the arithmetic-geometric mean inequality, is $O(c_S(N, a)^2 + c_S(N, a - 1)^2)$. The 
algorithm also requires us to compare points in neighbouring squares, say $S_1$ and 
$S_2$, which, similarly, takes $O(c_{S_1}(N, a)^2 + c_{S_2}(N, a - 1)^2)$ time. Since each square has $O(1)$ 
neighbours, the overall running time to compare pairs of points is 

$$O\left(\sum_S \left(c_S(N, a)^2 + c_S(N, a - 1)^2\right)\right), \tag{5.1}$$

the sum being over the roughly $a$ squares of side length $a^{1/2}$ that partition the $a \times a$ 
square $\{(x, y) \in \mathbb{Z}^2 \mid 0 \le x, y < a\}$ (at the top and right edges we get rectangles, 
unless $a^{1/2}$ is an integer). 

Consider now the contribution from the points mod $a$: 

$$\sum_S c_S(N, a)^2. \tag{5.2}$$

For convenience, rather than deal with squares $S$ of side length $a^{1/2}$, we make a 
small adjustment and partition the $a \times a$ square into squares $B$ of side length 

$$b = \lceil a^{1/2} \rceil. \tag{5.3}$$

We also assume that $\gcd(b, a) = 1$. If not, replace $b$ with $b + 1$ until this condition 
holds. By equation (4.7), this will not take long to occur, so that, for any $\varepsilon > 0$, 
$b = a^{1/2} + O(a^{\varepsilon})$. 
The squares $B$ are 

$$B = B_{i,j} = \{(x, y) \in \mathbb{Z}^2 \mid ib \le x < (i + 1)b,\ jb \le y < (j + 1)b\} \tag{5.4}$$

with $0 \le i, j \le a/b - 1$. 

Since $b \nmid a$, these will not entirely cover the $a \times a$ square, but the number of 
points $(x, y) \in \mathbb{Z}^2$ satisfying $xy \equiv N \pmod a$ that are neglected at the rightmost 
and top portions of the $a \times a$ square is, by (4.7), $O(\phi(a)b/a)$, and these therefore 
contribute $O(\phi(a)^2 b^2/a^2) = O(\phi(a))$ to (5.2). 
The points $(x, y) \in \mathbb{Z}^2$ belonging to an $a^{1/2} \times a^{1/2}$ square $S$ are contained entirely 
in at most four squares of side length $b$, say $B_{i_1 j_1}, B_{i_1 j_2}, B_{i_2 j_1}, B_{i_2 j_2}$. Therefore, 

$$c_S(N, a)^2 \le \left(c_{B_{i_1 j_1}}(N, a) + c_{B_{i_1 j_2}}(N, a) + c_{B_{i_2 j_1}}(N, a) + c_{B_{i_2 j_2}}(N, a)\right)^2, \tag{5.5}$$

which, by the Cauchy-Schwarz inequality, is 

$$\le 4\left(c_{B_{i_1 j_1}}(N, a)^2 + c_{B_{i_1 j_2}}(N, a)^2 + c_{B_{i_2 j_1}}(N, a)^2 + c_{B_{i_2 j_2}}(N, a)^2\right). \tag{5.6}$$

Since each $B$ square overlaps with $O(1)$ $S$ squares, we thus have that 

$$\sum_S c_S(N, a)^2 = O\left(\phi(a) + \sum_B c_B(N, a)^2\right), \tag{5.7}$$

the $\phi(a)$ accounting for the contribution from the neglected portion at the rightmost 
and top portions of the $a \times a$ square. 

A similar consideration for the points satisfying $xy \equiv N \pmod{a - 1}$, partitioning 
the large $a \times a$ square into squares $D$ of side length $d$, where $d$ is the smallest 
integer $\ge \lceil (a - 1)^{1/2} \rceil$ that is coprime to $a - 1$, gives the same kind of sum 

$$\sum_S c_S(N, a - 1)^2 = O\left(\phi(a - 1) + \sum_D c_D(N, a - 1)^2\right). \tag{5.8}$$

Therefore, we need to estimate the second moment 

$$\sum_B c_B(N, a)^2, \tag{5.9}$$

where $B$ ranges over all $\lfloor a/b \rfloor^2$ squares of the form (5.4). To prove that the running 
time of the Hide and Seek algorithm of Section 2 is $O(N^{1/3+\varepsilon})$, we need to prove 
that (5.9) is $O(a^{1+\varepsilon})$. 

Theorem 5.1. Let 

$$P = \{B_{ij}\}_{0 \le i, j \le a/b - 1}. \tag{5.10}$$

Then 

$$\sum_{B \in P} c_B(N, a)^2 = O(a^{1+\varepsilon}). \tag{5.11}$$

Proof. Rather than look at just the $a \times a$ square, it is helpful to consider the $ba \times ba$ 
square $\{(x, y) \in \mathbb{Z}^2 \mid 0 \le x, y < ba\}$. The advantage of looking at the larger square 
will become apparent when we turn to the discrete Fourier transform and sum 
over all the $a$-th roots of unity. 

This larger square can be partitioned into $b^2$ squares of side length $a$. Because 
the solutions to $xy \equiv N \pmod a$ repeat mod $a$, we can count each $c_B(N, a)^2$ 
once per $a \times a$ square, by summing $c_{B'}(N, a)^2 = c_B(N, a)^2$ over all $b^2$ translates 
$B' = B + (r_1 a, r_2 a)$ of $B$, with $0 \le r_1, r_2 < b$. 

On the other hand, we can also partition the $ba \times ba$ square into $a^2$ squares of 
side length $b$: 

$$P_2 = \{B_{ij}\}_{0 \le i, j \le a - 1}, \tag{5.12}$$

with $B_{ij}$ given by (5.4). 

Each translate of a $B$ square, $B' = B + (r_1 a, r_2 a)$, is covered by at most four 
$B_{ij} \in P_2$, and each $B_{ij} \in P_2$ overlaps at most four such translates of $B$. 
Hence, applying the Cauchy-Schwarz inequality as before, 

$$b^2\sum_{B \in P} c_B(N, a)^2 = O\left(\sum_{B \in P_2} c_B(N, a)^2\right). \tag{5.13}$$

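To study $c_B(N, a)^2$ we multiply equation (4.4) by its conjugate, giving 

$$c_B(N, a)^2 = \frac{1}{a^2}\sum_{0 \le k_1, k_2 \le a-1}\ \sum_{\substack{(x_1, y_1) \in B \\ \gcd(x_1, a) = 1}}\ \sum_{\substack{(x_2, y_2) \in B \\ \gcd(x_2, a) = 1}} e\left(\frac{k_1(y_1 - \bar{x}_1 N) - k_2(y_2 - \bar{x}_2 N)}{a}\right). \tag{5.14}$$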

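Next, sum over all $B_{ij} \in P_2$, and break up each sum over $(x, y) \in B_{ij}$ into a double 
sum $ib \le x < (i + 1)b$, $jb \le y < (j + 1)b$: 

$$\sum_{B \in P_2} c_B(N, a)^2 = \frac{1}{a^2}\sum_{0 \le k_1, k_2 \le a-1}\left(\sum_{i=0}^{a-1}\sum_{\substack{ib \le x_1, x_2 < (i+1)b \\ \gcd(x_1 x_2, a) = 1}} e\left(\frac{N(k_2\bar{x}_2 - k_1\bar{x}_1)}{a}\right)\right)\left(\sum_{j=0}^{a-1}\sum_{jb \le y_1, y_2 < (j+1)b} e\left(\frac{k_1 y_1 - k_2 y_2}{a}\right)\right). \tag{5.15}$$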

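Now, consider the innermost sum, 

$$\sum_{j=0}^{a-1}\ \sum_{jb \le y_1, y_2 < (j+1)b} e\left(\frac{k_1 y_1 - k_2 y_2}{a}\right). \tag{5.16}$$

For each $j$, the sum over $y_1, y_2$ is a product of two geometric series and equals 

$$e\left(\frac{(k_1 - k_2)jb}{a}\right)\frac{e(k_1 b/a) - 1}{e(k_1/a) - 1}\cdot\frac{e(-k_2 b/a) - 1}{e(-k_2/a) - 1}. \tag{5.17}$$

We understand $\frac{e(kb/a) - 1}{e(k/a) - 1}$ to equal $b$ if $k \equiv 0 \pmod a$. Summing (5.17) over $0 \le j \le a - 1$ gives 

$$\begin{cases} a\left|\dfrac{e(k_1 b/a) - 1}{e(k_1/a) - 1}\right|^2 & \text{if } k_1 \equiv k_2 \pmod a, \\ 0 & \text{otherwise} \end{cases}$$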

(recall that we have chosen $b$ so that $\gcd(b, a) = 1$). Therefore, only the terms with 
$k_1 = k_2$ contribute to (5.15), and it equals 

$$\frac{1}{a}\sum_{k=0}^{a-1}\left|\frac{e(kb/a) - 1}{e(k/a) - 1}\right|^2\left(\sum_{i=0}^{a-1}\sum_{\substack{ib \le x_1, x_2 < (i+1)b \\ \gcd(x_1 x_2, a) = 1}} e\left(\frac{Nk(\bar{x}_2 - \bar{x}_1)}{a}\right)\right). \tag{5.18}$$



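The $k = 0$ term gives, on separating the sum over $x_1$ and $x_2$, 

$$\frac{b^2}{a}\sum_{i=0}^{a-1}\left(\sum_{\substack{ib \le x < (i+1)b \\ \gcd(x, a) = 1}} 1\right)^2, \tag{5.19}$$

which, by (4.7) and using $b \sim a^{1/2}$, equals 

$$b^2\left(\frac{\phi(a)b}{a} + O(a^{\varepsilon})\right)^2 = O(\phi(a)^2). \tag{5.20}$$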
Next, we deal with the terms $1 \le k \le a - 1$. The sum over $i$ in (5.18) equals 

$$\sum_{i=0}^{a-1}\sum_{ib \le x_1, x_2 < (i+1)b} A_{x_1, x_2}(-Nk), \tag{5.21}$$

where 

$$A_{x_1, x_2}(t) = \begin{cases} 0 & \text{if } \gcd(x_1 x_2, a) > 1, \\ e\left(t(\bar{x}_1 - \bar{x}_2)/a\right) & \text{otherwise.} \end{cases}$$
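To analyze this sum, we use the two-dimensional discrete Fourier transform 

$$\hat{A}_{m_1, m_2}(t) = \sum_{0 \le x_1, x_2 \le a-1} A_{x_1, x_2}(t)\, e\left(-\frac{m_1 x_1 + m_2 x_2}{a}\right), \tag{5.22}$$

so that 

$$A_{x_1, x_2}(t) = \frac{1}{a^2}\sum_{0 \le m_1, m_2 \le a-1}\hat{A}_{m_1, m_2}(t)\, e\left(\frac{m_1 x_1 + m_2 x_2}{a}\right), \tag{5.23}$$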

and (5.21) equals, on changing the order of summation, 

$$\frac{1}{a^2}\sum_{0 \le m_1, m_2 \le a-1}\hat{A}_{m_1, m_2}(-Nk)\left(\sum_{i=0}^{a-1}\sum_{ib \le x_1, x_2 < (i+1)b} e\left(\frac{m_1 x_1 + m_2 x_2}{a}\right)\right). \tag{5.24}$$

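The bracketed sum over $i$ is similar to the sum over $j$ worked out above and equals 

$$\begin{cases} a\left|\dfrac{e(m_1 b/a) - 1}{e(m_1/a) - 1}\right|^2 & \text{if } m_2 \equiv -m_1 \pmod a, \\ 0 & \text{otherwise.} \end{cases}$$

Therefore, (5.24) equals (indices of $\hat{A}$ being read mod $a$) 

$$\frac{1}{a}\sum_{m=0}^{a-1}\hat{A}_{m, a-m}(-Nk)\left|\frac{e(mb/a) - 1}{e(m/a) - 1}\right|^2. \tag{5.25}$$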


So, (5.18), and hence (5.15), equals 

$$\frac{1}{a^2}\sum_{k=0}^{a-1}\sum_{m=0}^{a-1}\hat{A}_{m, a-m}(-Nk)\left|\frac{e(mb/a) - 1}{e(m/a) - 1}\right|^2\left|\frac{e(kb/a) - 1}{e(k/a) - 1}\right|^2. \tag{5.26}$$


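But 

$$\hat{A}_{m, a-m}(-Nk) = \sum_{\substack{0 \le x_1, x_2 \le a-1 \\ \gcd(x_1 x_2, a) = 1}} e\left(\frac{-Nk(\bar{x}_1 - \bar{x}_2) - m x_1 + m x_2}{a}\right) = \left|\sum_{\substack{0 \le x \le a-1 \\ \gcd(x, a) = 1}} e\left(-\frac{Nk\bar{x} + mx}{a}\right)\right|^2. \tag{5.27}$$

However, the sum on the r.h.s. is a Kloosterman sum, 

$$\sum_{\substack{0 \le x \le a-1 \\ \gcd(x, a) = 1}} e\left(-\frac{Nk\bar{x} + mx}{a}\right) = S(-m, -Nk, a), \tag{5.28}$$

where $S(m, n, a) = \sum_{0 \le x \le a-1,\ \gcd(x, a) = 1} e((mx + n\bar{x})/a)$. Such sums are known [7] [5] to satisfy the bound 

$$|S(-m, -Nk, a)| \le \tau(a)\gcd(m, k, a)^{1/2}a^{1/2} = O\left(a^{1/2+\varepsilon}\gcd(k, a)^{1/2}\right) \tag{5.29}$$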
(recall we are assuming that $\gcd(N, a) = 1$, so that $N$ does not appear on the r.h.s. 
of this inequality). Applying this bound to $\hat{A}_{m, a-m}(-Nk)$ shows that (5.26) is 

$$O\left(a^{-1+\varepsilon}\sum_{k=1}^{a-1}\sum_{m=0}^{a-1}\gcd(k, a)\left|\frac{e(mb/a) - 1}{e(m/a) - 1}\right|^2\left|\frac{e(kb/a) - 1}{e(k/a) - 1}\right|^2 + \phi(a)^2\right). \tag{5.30}$$

The $\phi(a)^2$ term comes from the $k = 0$ contribution, (5.20). We must isolate this 
term, otherwise the estimate below will be too large. 
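Separating sums gives 

$$O\left(a^{-1+\varepsilon}\sum_{k=1}^{a-1}\gcd(k, a)\left|\frac{e(kb/a) - 1}{e(k/a) - 1}\right|^2\ \sum_{m=0}^{a-1}\left|\frac{e(mb/a) - 1}{e(m/a) - 1}\right|^2 + \phi(a)^2\right). \tag{5.31}$$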



Both sums can be bounded using the same approach as for (4.11) in the previous 
section, namely: combining terms $k$ and $a - k$ (similarly for the $m$ sum, but taking 
the $m = 0$ term alone), breaking up the sum into the terms with $k \le a/(\pi b) \sim a^{1/2}/\pi$ 
and the remaining terms (respectively, $m$), applying inequalities (4.14), estimating the resulting sums, 
and using $b \sim a^{1/2}$, we find, for any $\varepsilon > 0$, that (5.31) equals 

$$O(b^2 a^{1+\varepsilon}). \tag{5.32}$$

We have thus estimated the sum that appears on the r.h.s. of (5.13). The sum 
that we wish to bound appears on the l.h.s. of (5.13), but with an extra factor of 
$b^2$. Hence, dividing the above by $b^2$ gives $O(a^{1+\varepsilon})$ for the sum in the theorem. 

□ 
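The proof rests on the Kloosterman sum bound (5.29). This is a deep result: Weil's bound for prime moduli, extended to composite moduli by Estermann. A computer can only spot-check it. The sketch below (helper names hypothetical) evaluates $S(m, n, a) = \sum_{x} e((mx + n\bar{x})/a)$ directly and tests $|S(m, n, a)| \le \tau(a)\gcd(m, n, a)^{1/2}a^{1/2}$ for small moduli.

```python
import cmath
from math import gcd

def kloosterman(m, n, a):
    """S(m, n, a) = sum over units x mod a of e((m*x + n*xbar)/a), for a >= 2."""
    total = 0j
    for x in range(1, a):
        if gcd(x, a) == 1:
            xbar = pow(x, -1, a)
            total += cmath.exp(2j * cmath.pi * ((m * x + n * xbar) % a) / a)
    return total

def num_divisors(a):
    """tau(a), by brute force."""
    return sum(1 for d in range(1, a + 1) if a % d == 0)

def weil_estermann_ok(m, n, a):
    """Check |S(m, n, a)| <= tau(a) * gcd(m, n, a)^(1/2) * a^(1/2), as in (5.29)."""
    bound = num_divisors(a) * gcd(gcd(m, n), a) ** 0.5 * a ** 0.5
    return abs(kloosterman(m, n, a)) <= bound + 1e-9
```

For a prime $p$ not dividing $mn$, the bound reduces to $|S(m, n, p)| \le 2\sqrt{p}$. For instance, $S(1, 1, 4) = -2$ and $S(1, 1, 8) = 0$, because every unit modulo 8 is its own inverse.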



Remark: In certain cases, such as when $a = p^2$ with $p$ prime, one can improve the 
above estimate for the second moment to $O(a)$ by taking $b = p$ and, for $x = jp + l$ 
with $\gcd(l, p) = 1$, using $\bar{x} \equiv \bar{l}^2(l - jp) \pmod{p^2}$. 

5.1. Running time of the variant, for $1 < U < V < N$. Instead of partitioning 
the $a \times a$ square into smaller squares of side length $b \sim a^{1/2}$, we partition it into 
rectangles $R$ of width $w \le a$ and height $h \le a$, where $w, h \in \mathbb{Z}$ and, for convenience, 
$\gcd(w, a) = \gcd(h, a) = 1$. If this condition does not hold, simply increment them 
individually until it does; as before, this will require adding at most $O(a^{\varepsilon})$ to 
each. 

We partition the $a \times a$ square and also the larger $wa \times ha$ rectangle into smaller 
rectangles $R$: 

$$R = R_{ij} = \{(x, y) \in \mathbb{Z}^2 \mid iw \le x < (i + 1)w,\ jh \le y < (j + 1)h\},$$

$$Q = \{R_{ij}\}_{0 \le i \le a/w - 1,\ 0 \le j \le a/h - 1}, \qquad Q_2 = \{R_{ij}\}_{0 \le i, j \le a - 1}.$$

As in the proof of Theorem 5.1, we have 

$$wh\sum_{R \in Q} c_R(N, a)^2 = O\left(\sum_{R \in Q_2} c_R(N, a)^2\right), \tag{5.33}$$

with $wh$ appearing on the l.h.s. since the large $wa \times ha$ rectangle has that many 
copies of the $a \times a$ square. 
Using the discrete Fourier transform, as before, 

$$\sum_{R \in Q_2} c_R(N, a)^2 = \frac{1}{a^2}\sum_{k=0}^{a-1}\sum_{m=0}^{a-1}|S(-m, -Nk, a)|^2\left|\frac{e(mw/a) - 1}{e(m/a) - 1}\right|^2\left|\frac{e(kh/a) - 1}{e(k/a) - 1}\right|^2. \tag{5.34}$$

This useful identity expresses the second moment for the larger $wa \times ha$ rectangle 
as a sum involving Kloosterman sums. 

The $k = 0$ term can be estimated as in (5.20) and equals 

$$h^2\left(\frac{w\phi(a)}{a} + O(a^{\varepsilon})\right)^2. \tag{5.35}$$

For the $k \ge 1$ terms, we use the bound (5.29) to estimate the Kloosterman sums and 
separate the double sum above to get a contribution of 

$$O\left(a^{-1+\varepsilon}\sum_{k=1}^{a-1}\gcd(k, a)\left|\frac{e(kh/a) - 1}{e(k/a) - 1}\right|^2\ \sum_{m=0}^{a-1}\left|\frac{e(mw/a) - 1}{e(m/a) - 1}\right|^2\right). \tag{5.36}$$

The first sum is estimated to equal $O(\tau(a)ah)$, while the second sum is $O(aw)$, 
giving, for $k \ge 1$, a contribution of 

$$O(a^{1+\varepsilon}wh) \tag{5.37}$$

for any $\varepsilon > 0$. Putting (5.35) and (5.37) together, then dividing by the factor $wh$ on the l.h.s. of (5.33), 
gives the following estimate for the second moment: 

Theorem 5.2. Let $1 \le w, h \le a$, with $\gcd(w, a) = \gcd(h, a) = 1$. Then, using the 
notation above, we have an estimate for the second moment that depends on the 
area $wh$ of the rectangles $R$: 

$$\sum_{R \in Q} c_R(N, a)^2 = \begin{cases} O(a^{1+\varepsilon}) & \text{if } wh = O(a^{1+\varepsilon}), \\ O(wh\,\phi(a)^2/a^2) & \text{if } wh \gg a^{\lambda} \text{ for some } \lambda > 1. \end{cases} \tag{5.38}$$

Remark: if $\gcd(w, a) = \gcd(h, a) = 1$ does not hold, one can increment $w$ and $h$ by 
at most $O(a^{\varepsilon})$ until this condition holds. So long as $w, h \gg a^{\varepsilon}$ to begin with, the 
estimates in the theorem are unaffected. 

In Section 2.1, our choice of $w$ and $h$ has $wh = O(a)$, and the estimate for the 
second moment is then $O(a^{1+\varepsilon})$, as in Theorem 5.1. 
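The second moment in Theorem 5.2 can be computed directly for moderate $a$, which gives a feel for the two regimes. The sketch below (the name `second_moment` is hypothetical) counts each solution once via $x \mapsto N\bar{x} \bmod a$, bins it into the $w \times h$ rectangle $R_{ij}$, drops the truncated rectangles at the right and top edges (as in the definition of $Q$), and returns $\sum_{R \in Q} c_R(N, a)^2$. No specific numerical results are claimed. The demo simply prints the moment next to $a$ for a few rectangle shapes with $a = 10007$ and $N = 2$.

```python
from math import gcd
from collections import Counter

def second_moment(N, a, w, h):
    """Sum of c_R(N, a)^2 over the w-by-h rectangles R_ij in Q."""
    nx, ny = a // w, a // h            # rectangles lying fully inside [0, a)^2
    counts = Counter()
    for x in range(1, a):
        if gcd(x, a) == 1:
            y = N * pow(x, -1, a) % a  # the unique partner of x
            i, j = x // w, y // h
            if i < nx and j < ny:      # drop truncated edge rectangles
                counts[(i, j)] += 1
    return sum(c * c for c in counts.values())

if __name__ == "__main__":
    a, N = 10007, 2                    # 10007 is prime, so gcd(w, a) = gcd(h, a) = 1
    for w, h in ((100, 100), (10, 1000), (1000, 1000)):
        print(w, h, second_moment(N, a, w, h), a)
```

The first two shapes have $wh \approx a$, the regime used by the algorithm. The last has $wh \approx a^{3/2}$, where Theorem 5.2 instead predicts growth like $wh\,\phi(a)^2/a^2$.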

The second estimate of the theorem (not relevant for our particular application), 
$O(wh\,\phi(a)^2/a^2)$, can probably be turned into an asymptotic formula and a central 
limit theorem proven. This will remain an inquiry for the future. 

5.1.1. Acknowledgements. I wish to thank Andrew Granville, Carl Pomerance, and 
Matthew Young for helpful feedback. 